feat!: make voice receivers return AudioPacket classes instead of just Audio#11432
feat!: make voice receivers return AudioPacket classes instead of just Audio#11432petergeneric wants to merge 12 commits intodiscordjs:mainfrom
Conversation
…nd SSRC. Refactored so that instead of pure Buffer, we now send AudioPacket (interface extending Buffer) which has readonly fields sequence, timestamp, and ssrc.
…e, timestamp, ssrc)
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
📝 WalkthroughWalkthroughThe changes extend audio packet handling in the voice receiver to include RTP metadata (sequence number, timestamp, and SSRC). A new public Changes
Sequence DiagramsequenceDiagram
actor UDP as UDP Source
participant VR as VoiceReceiver
participant CAP as createAudioPacket
participant ARS as AudioReceiveStream
UDP->>VR: onUdpMessage (encrypted packet)
Note over VR: Extract RTP sequence,<br/>timestamp from bytes
VR->>VR: Decrypt packet payload
VR->>CAP: createAudioPacket(buffer,<br/>sequence, timestamp, ssrc)
CAP->>CAP: Attach metadata as<br/>non-enumerable properties
CAP-->>VR: AudioPacket (Buffer + metadata)
VR->>ARS: stream.push(AudioPacket)
Note over ARS: Consumer receives<br/>Buffer with metadata
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/voice/src/receive/AudioReceiveStream.ts`:
- Around line 39-41: Update the documentation for the AudioPacket buffer to
state it contains an Opus-encoded payload (with RTP header metadata) rather than
a "decoded Opus packet"; locate the AudioPacket type/comment in
AudioReceiveStream (or the AudioPacket JSDoc) and change the wording to
explicitly say "a Buffer containing an Opus-encoded payload with RTP header
metadata" so API consumers aren't misled.
In `@packages/voice/src/receive/VoiceReceiver.ts`:
- Around line 177-180: The RTP header reads in VoiceReceiver (variables
sequence, timestamp, ssrc reading from msg) can throw for 9–11 byte buffers;
change the early length guard to require at least 12 bytes (e.g., if (msg.length
< 12) return;) or move these reads inside the existing try that wraps
parsePacket so any RangeError is caught; update the check or relocate the reads
in the VoiceReceiver.ts function that computes sequence/timestamp/ssrc to ensure
no unhandled RangeError occurs.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
packages/voice/__tests__/VoiceReceiver.test.tspackages/voice/src/receive/AudioReceiveStream.tspackages/voice/src/receive/VoiceReceiver.ts
📜 Review details
🧰 Additional context used
🧬 Code graph analysis (2)
packages/voice/src/receive/VoiceReceiver.ts (1)
packages/voice/src/receive/AudioReceiveStream.ts (1)
AudioPacket(42-59)
packages/voice/__tests__/VoiceReceiver.test.ts (1)
packages/voice/__mocks__/rtp.ts (3)
RTP_PACKET_DESKTOP(7-16)RTP_PACKET_CHROME(18-26)RTP_PACKET_ANDROID(28-37)
🔇 Additional comments (2)
packages/voice/src/receive/VoiceReceiver.ts (1)
23-30:createAudioPacketwrapper is clean and backward-compatible.Non-enumerable readonly metadata on
Bufferis a good approach for preserving legacy buffer behavior.packages/voice/__tests__/VoiceReceiver.test.ts (1)
71-110: Great coverage additions for metadata passthrough and compatibility.These tests validate both RTP header field extraction and Buffer backward compatibility across multiple packet variants.
- Improve docstring use (also moved method to be private static to be more in-line with rest of code, and improved clarity of naming) - Fix pre-existing issue (min packet length was 8 bytes, but was expecting reading a uint32 at offset 8, so actual min length is 12) - Fix AudioPacket description
There was a problem hiding this comment.
Since you're already touching RTP headers I noticed that this is no longer correct. We have the full packet here, so this should check the eXtension bit (4th bit of the first byte) of the header instead of looking for magic bytes of the header extension data. And it even looks like this is stripping the wrong amount of header data, unless I'm missing something here.
There was a problem hiding this comment.
Yes that's right, it'll need to consider the case where the CC field is > 0.
I've been working on this. Started as using the X bit but it's now a larger change (parsePacket now parses RTP Header fields, and passes payload start pos into decrypt rather than it recomputing. As a result the unit tests call parsePacket rather than directly calling decrypt; I also took the opportunity to change some buffer variable names for better clarity). See 7575a25
Are you happy with that being part of this PR, or should it be split out into a separate PR?
…r.ts Co-authored-by: Qjuh <76154676+Qjuh@users.noreply.github.com>
…d fixing style to use hyphens. Also move addPacketHeaders to a bare function below the class per review comment
Minor naming changes as part of this because we are working with two Buffers (raw RTP packet, decrypted RTP/DAVE payload). Change tests in VoiceReceiver.test.ts that were directly testing `decrypt` to instead test `parsePacket`
…er than an extension of Buffer. This breaks backwards-compatibility for existing AudioReceiveStream users, but is cleaner and allows for future extensibility.
… Documented constructor as not a public interface to discourage end-users from using it and breaking if we extend AudioPacket in the future
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #11432 +/- ##
==========================================
+ Coverage 31.66% 31.70% +0.03%
==========================================
Files 386 386
Lines 13966 13977 +11
Branches 1098 1099 +1
==========================================
+ Hits 4422 4431 +9
- Misses 9410 9412 +2
Partials 134 134
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
FYI: code coverage fail is misleading, this is existing lack of coverage (these lines are just reorganisations of the code that was already there to make it easier to understand). It looks like there's no test data for DAVE |
|
For testing sake, can you draft up a quick code sample that can be used to see how it works now? (bonus point, we can document the breaking change) |
|
@vladfrangu Sure, here's a sample snippet that'll join a voice channel and subscribe to all speakers, is this what you were thinking of? import type { VoiceBasedChannel } from 'discord.js';
import {
joinVoiceChannel,
entersState,
VoiceConnectionStatus,
EndBehaviorType,
type AudioPacket,
} from '@discordjs/voice';
async function joinAndRecordChannel(channel: VoiceBasedChannel) {
// Join the voice channel
const connection = joinVoiceChannel({
channelId: channel.id,
guildId: channel.guild.id,
adapterCreator: channel.guild.voiceAdapterCreator,
selfDeaf: false, // We want to receive audio
selfMute: true, // We don't want to send audio
});
// Wait for the connection to be ready
await entersState(connection, VoiceConnectionStatus.Ready, 30_000);
const receiver = connection.receiver;
// Subscribe to all users when they speak
const subscribedUsers = new Set<string>();
receiver.speaking.on('start', (userId) => {
if (subscribedUsers.has(userId)) return;
subscribedUsers.add(userId);
// Subscribe to the user's audio stream
const audioStream = receiver.subscribe(userId, {
end: { behavior: EndBehaviorType.Manual },
});
// New API: 'data' event now emits an AudioPacket rather than a Buffer
audioStream.on('data', (packet: AudioPacket) => handleOpusPacket(userId, packet));
audioStream.on('error', (err) => {
console.error(`Audio stream error for ${userId}:`, err);
});
});
}
function handleOpusPacket(userId: string, packet: AudioPacket) {
console.log(
`User ${userId}: rtp.ssrc=${packet.ssrc} rtp.seq=${packet.sequence} rtp.ts=${packet.timestamp}@48kHz opus=${packet.payload.length}B`,
);
}(a better example might show how to use the new fields in opus decoding to produce a fully synchronised PCM from all speakers, but I've extracted this from my transcription bot which defers opus decode to an offline process to minimise load while recording) Here's an initial draft of the breaking change doc entry (have tried to match general style from the Changes in v14 .mdx): AudioReceiveStream
The Opus audio data can be found in the - // Previously, each chunk was a raw Buffer of Opus data:
- receiver.subscribe(userId).on('data', (chunk) => {
- processOpusData(userId, chunk); // chunk was a Buffer
- });
+ // Now, each chunk is an AudioPacket with payload and RTP metadata:
+ receiver.subscribe(userId).on('data', (packet: AudioPacket) => {
+ processOpusData(userId, packet.payload); // Opus data is on .payload
+ console.log(packet.sequence); // RTP sequence number
+ console.log(packet.timestamp); // RTP timestamp (48kHz clock)
+ console.log(packet.ssrc); // RTP synchronization source
+ }); |
|
That is perfect, thanks! |
Currently, RTP packet headers are stripped and clients are only delivered Opus frames as
Buffers. The lack of timestamp makes jitter/drift hard to combat for clients, who must compute wall-clock delivery time.This change introduces an
AudioPacketinterface which extendsBufferwith the following read-only fields:sequence(16-bit uint monotonically increasing counter to allow for identifying out-of-order RTP packets)timestamp(32-bit uint that counts encoder-side timestamps; RFC 7587, Opus in RTP requires this be expressed at 48kHz no matter the audio sample rate)ssrc(to allow consumers to detect a change and reset their Opus decoder state; updates on this value can already be received via SSRCMap events, however I think it also belongs on AudioPacket because there's a risk of delayed SSRCMap event delivery, and the client needs to know as soon as they receive a packet that changes it so they can reset their decoder state to correctly parse the packet)This change deliberately does not attempt to provide a generalised parser for all the fields in the RTP Header to keep this PR small and focused just on the essential fields.